NSF PAR Search | NSF Public Access Repository

HAN: a Hierarchical AutotuNed Collective Communication Framework

https://doi.org/10.1109/CLUSTER49012.2020.00013

Luo, Xi; Wu, Wei; Bosilca, George; Pei, Yu; Cao, Qinglei; Patinyasakdikul, Thananon; Zhong, Dong; Dongarra, Jack (September 2020, 2020 IEEE International Conference on Cluster Computing (CLUSTER))

Full Text Available
Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information

https://doi.org/10.1145/3127024.3127039

Eberius, David; Patinyasakdikul, Thananon; Bosilca, George (September 2017, EuroMPI)

This paper details the implementation and usage of software-based performance counters to understand the performance of a particular implementation of the MPI standard, Open MPI. Such counters can expose intrinsic features of the software stack that are not otherwise available in a generic and portable way. The PMPI interface is useful for instrumenting MPI applications at the user level; however, it is insufficient for providing meaningful internal MPI performance details. While the Peruse interface provides more detailed information on state changes within Open MPI, it has not seen widespread adoption. We introduce a simple low-level approach that instruments the Open MPI code at key locations to provide fine-grained MPI performance metrics. We evaluate the overhead associated with adding these counters to Open MPI, as well as their use in determining bottlenecks and areas for improvement in both user code and the MPI implementation itself.
Full Text Available
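
As a rough illustration of the idea in the abstract above, the following sketch instruments a function at its call site with software counters. It is not Open MPI code: the `counters` registry, the `counted` decorator, and the `send` stand-in are all hypothetical names chosen for this example, and the real counters live inside the C implementation of the library.

```python
from collections import Counter
from functools import wraps

# Hypothetical counter registry (illustrative only, not Open MPI's API).
counters = Counter()

def counted(name):
    """Wrap a function so every call updates a call counter and a byte counter."""
    def decorator(func):
        @wraps(func)
        def wrapper(payload, *args, **kwargs):
            counters[name + "_calls"] += 1            # number of invocations
            counters[name + "_bytes"] += len(payload)  # total payload size
            return func(payload, *args, **kwargs)
        return wrapper
    return decorator

@counted("send")
def send(payload, dest):
    # Stand-in for a real transport call; it only reports the payload size.
    return len(payload)

send(b"hello", dest=1)
send(b"hi", dest=2)
print(dict(counters))
```

Because the counters are updated inside the instrumented routine rather than in a wrapper around the public API, this style can record internal events that a PMPI-level tool would not see, at the cost of a small amount of work on every call.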
ADAPT: An Event-Based Adaptive Collective Communication Framework

https://doi.org/10.1145/3208040.3208054

Luo, Xi; Wu, Wei; Bosilca, George; Patinyasakdikul, Thananon; Wang, Linnan; Dongarra, Jack (June 2018, The 27th International Symposium on High-Performance Parallel and Distributed Computing)

The increase in scale and heterogeneity of high-performance computing (HPC) systems makes the performance of Message Passing Interface (MPI) collective communications susceptible to noise and requires these operations to adapt to a complex mix of hardware capabilities. The designs of state-of-the-art MPI collectives rely heavily on synchronizations; these designs magnify noise across the participating processes, resulting in significant performance slowdown. Therefore, such a design philosophy must be reconsidered to run efficiently and robustly on large-scale heterogeneous platforms. In this paper, we present ADAPT, a new collective communication framework in Open MPI, using event-driven techniques to morph collective algorithms to heterogeneous environments. The core concept of ADAPT is to relax synchronizations while maintaining the minimal data dependencies of MPI collectives. To fully exploit the different bandwidths of data movement lanes in heterogeneous systems, we extend the ADAPT collective framework with a topology-aware communication tree. This removes the boundaries of different hardware topologies while maximizing the speed of data movements. We evaluate our framework with two popular collective operations, broadcast and reduce, on both CPU and GPU clusters. Our results demonstrate drastic performance improvements and a strong resistance against noise compared to other state-of-the-art MPI libraries. In particular, we demonstrate at least 1.3X and 1.5X speedup for CPU data and 2X and 10X speedup for GPU data using ADAPT event-based broadcast and reduce operations.
Full Text Available
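
To make the notion of a topology-aware communication tree more concrete, here is a deliberately simplified sketch. It assumes a two-level layout (ranks grouped by node) and a hypothetical helper `two_level_tree`; it is not ADAPT's actual tree construction, which the abstract does not specify.

```python
from collections import defaultdict

def two_level_tree(rank_to_node, root=0):
    """Return {parent: [children]} for a two-level broadcast tree.

    Assumption for this sketch: the root forwards to one leader per remote
    node, and each leader forwards to the other ranks on its own node.
    """
    # Group ranks by the node they run on, in ascending rank order.
    by_node = defaultdict(list)
    for rank in sorted(rank_to_node):
        by_node[rank_to_node[rank]].append(rank)

    tree = defaultdict(list)
    for ranks in by_node.values():
        # The root leads its own node; elsewhere the lowest rank leads.
        leader = root if root in ranks else ranks[0]
        if leader != root:
            tree[root].append(leader)  # inter-node edge
        tree[leader].extend(r for r in ranks if r != leader)  # intra-node edges
    return dict(tree)

# Hypothetical layout: six ranks spread over three nodes.
layout = {0: "n0", 1: "n0", 2: "n1", 3: "n1", 4: "n2", 5: "n2"}
tree = two_level_tree(layout)
print(tree)
```

In this layout only two messages cross node boundaries, and the remaining traffic stays on the faster intra-node paths, which is the kind of bandwidth difference a topology-aware tree is meant to exploit.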
